Bitcoin Price Analysis and Forecasting: Volatility Insights, Time Series Modeling, and Visualization¶
Overview¶
Bitcoin, the pioneering cryptocurrency, has ignited global interest due to its intriguing price fluctuations and potential impact on the financial landscape. This project is designed to provide a comprehensive exploration of Bitcoin's price behavior, encompassing historical trends, volatility patterns, and future price predictions.
The foundation of this analysis rests on historical Bitcoin price data collected from the CryptoCompare API. The dataset comprises hourly Bitcoin price data covering the two years ending July 23, 2023, 23:00:00. It includes the hourly opening, closing, high, and low prices, along with the corresponding trading volumes in Bitcoin and US Dollars. This comprehensive dataset serves as the bedrock for our in-depth analysis and forecasting endeavors.
Table of Contents
- 1 Exploratory Data Analysis (EDA)
- 1.1 Finding the maximum and minimum values of each column, with the date and time at which they occurred
- 1.2 Highest and lowest trading volumes from Bitcoin
- 1.3 The correlation between the trading volume and the Bitcoin closing price
- 1.4 Calculating and Visualizing Returns
- 1.5 Rolling Statistics
- 1.6 Seasonal Decomposition
- 2 Volatility Analysis
- 3 Time-Series Modeling and Forecasting
import time
import pandas as pd
import datetime
pd.options.mode.chained_assignment = None
# import requests
## API_KEY = key removed for privacy
# CRYPTO_SYMBOL = 'BTC' # Bitcoin symbol
# CURRENCY = 'USD' # Currency to convert prices into
# LIMIT = 2000 # Maximum limit per API call
# # Calculate the timestamps of the end date and two years earlier (in seconds)
# today_timestamp = int(pd.to_datetime('2023-07-23 23:00:00').timestamp())
# two_years_ago_timestamp = int(pd.to_datetime('2021-07-23 23:00:00').timestamp())
# # Create an empty list to store the data points
# data_points = []
# # Number of data points to fetch for 2 years
# num_data_points = 2 * 365 * 24 # 2 years * 365 days * 24 hours
# # Keep fetching data in batches until we get the required number of data points
# while len(data_points) < num_data_points:
# # Calculate the number of data points to fetch in this batch
# remaining_data_points = num_data_points - len(data_points)
# batch_limit = min(remaining_data_points, LIMIT)
# # Make the API call
# url = f'https://min-api.cryptocompare.com/data/v2/histohour?fsym={CRYPTO_SYMBOL}&tsym={CURRENCY}&limit={batch_limit}&toTs={today_timestamp}&api_key={API_KEY}'
# response = requests.get(url)
# if response.status_code == 200:
# batch_data = response.json()['Data']['Data']
# data_points.extend(batch_data)
# # Update 'today_timestamp' for the next batch
# today_timestamp -= (batch_limit * 3600) # 1 hour = 3600 seconds
# else:
# print(f'Error: Unable to retrieve data. Status code: {response.status_code}')
# break
# # Create a DataFrame from the list of data points
# df = pd.DataFrame(data_points)
# df['time'] = pd.to_datetime(df['time'], unit='s') # Convert timestamps to datetime format
# df.set_index('time', inplace=True)
# df.sort_index(inplace=True)
# df.to_csv('G:/Documents/Projects/Bitcoin/data/bitcoin_data.csv')
bitcoin_data = pd.read_csv('G:/Documents/Projects/Bitcoin/data/bitcoin_data.csv', index_col = 'time')
bitcoin_data.shape
(17521, 8)
bitcoin_data
| high | low | open | volumefrom | volumeto | close | conversionType | conversionSymbol | |
|---|---|---|---|---|---|---|---|---|
| time | ||||||||
| 2021-07-24 07:00:00 | 33946.07 | 33773.00 | 33910.18 | 387.98 | 13132107.01 | 33836.30 | direct | NaN |
| 2021-07-24 08:00:00 | 33846.75 | 33573.30 | 33836.30 | 666.07 | 22439901.91 | 33641.94 | direct | NaN |
| 2021-07-24 09:00:00 | 33880.44 | 33634.74 | 33641.94 | 631.73 | 21322080.78 | 33858.19 | direct | NaN |
| 2021-07-24 10:00:00 | 33956.63 | 33783.89 | 33858.19 | 724.58 | 24555816.77 | 33907.62 | direct | NaN |
| 2021-07-24 11:00:00 | 34044.97 | 33716.29 | 33907.62 | 832.28 | 28204862.68 | 33892.86 | direct | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-07-23 19:00:00 | 30343.99 | 30024.00 | 30255.16 | 1177.25 | 35575032.62 | 30090.59 | direct | NaN |
| 2023-07-23 20:00:00 | 30145.79 | 30090.10 | 30090.59 | 291.29 | 8772899.69 | 30144.70 | direct | NaN |
| 2023-07-23 21:00:00 | 30144.70 | 29934.57 | 30144.70 | 511.14 | 15339398.54 | 29949.33 | direct | NaN |
| 2023-07-23 22:00:00 | 30038.25 | 29948.23 | 29949.33 | 254.28 | 7629821.07 | 30017.73 | direct | NaN |
| 2023-07-23 23:00:00 | 30093.06 | 30016.82 | 30017.73 | 316.19 | 9505793.11 | 30085.53 | direct | NaN |
17521 rows × 8 columns
bitcoin_data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 17521 entries, 2021-07-24 07:00:00 to 2023-07-23 23:00:00
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   high              17521 non-null  float64
 1   low               17521 non-null  float64
 2   open              17521 non-null  float64
 3   volumefrom        17521 non-null  float64
 4   volumeto          17521 non-null  float64
 5   close             17521 non-null  float64
 6   conversionType    17521 non-null  object
 7   conversionSymbol  6 non-null      object
dtypes: float64(6), object(2)
memory usage: 1.2+ MB
bitcoin_data.drop(columns = ['conversionSymbol', 'conversionType'], inplace = True)
# Convert the string index to a pandas DatetimeIndex
bitcoin_data.index = pd.to_datetime(bitcoin_data.index)
bitcoin_data = bitcoin_data.drop_duplicates(keep='first')
Exploratory Data Analysis (EDA)¶
Exploratory Data Analysis is crucial for understanding the characteristics of your data and identifying patterns or anomalies. Here are some EDA steps you can perform:
Visualize Historical Prices: Plot the historical Bitcoin prices over time using line plots or candlestick charts. Observe any trends, seasonality, or notable events.
Calculate and Visualize Returns: Compute the percentage returns from the price data and plot them. Analyze the distribution of returns and look for patterns.
Rolling Statistics: Compute rolling statistics, such as moving averages and rolling standard deviations, to observe trends and volatility changes.
Seasonal Decomposition: Use seasonal decomposition techniques (e.g., seasonal decomposition of time series, or STL) to separate the data into trend, seasonality, and residual components.
Autocorrelation and Partial Autocorrelation: Analyze autocorrelation and partial autocorrelation plots to identify potential autoregressive (AR) and moving average (MA) components for time-series modeling.
import matplotlib.pyplot as plt
import seaborn as sns
import mplfinance as mpf
import numpy as np
import plotly.graph_objects as go
Finding the maximum and minimum values of each column, with the date and time at which they occurred¶
# Set the display format for floating-point numbers
pd.options.display.float_format = '{:.2f}'.format
# Create a new DataFrame to show the maximum value and its index beside it
max_values_df = pd.DataFrame({
'Max Value': bitcoin_data.max(),
'Date of Max Value': bitcoin_data.idxmax()
})
max_values_df
| Max Value | Date of Max Value | |
|---|---|---|
| high | 68978.64 | 2021-11-10 14:00:00 |
| low | 68464.62 | 2021-11-10 17:00:00 |
| open | 68624.18 | 2021-11-10 18:00:00 |
| volumefrom | 80010.58 | 2022-06-11 08:00:00 |
| volumeto | 2926103305.83 | 2021-10-27 05:00:00 |
| close | 68624.18 | 2021-11-10 17:00:00 |
Let's examine the market in the week leading up to that all-time high on 2021-11-10 14:00:00.
highest_time = pd.to_datetime('2021-11-10 14:00:00')
# Calculate the datetime one week before the all-time high
week_before_highest = highest_time - pd.Timedelta(weeks=1)
# Build the hourly datetimes for that week using date_range
datetimes_week_before_highest = pd.date_range(start=week_before_highest, end=highest_time, freq='h')
# Access the corresponding rows in the DataFrame
data_week_before_highest = bitcoin_data.loc[datetimes_week_before_highest]
plt.figure(figsize=(9, 6))
sns.lineplot(x=data_week_before_highest.index, y=data_week_before_highest.high, color='blue')
plt.title('High Values for the Week Before the All-Time High')
plt.xlabel('Datetime')
plt.ylabel('High Value')
plt.grid(True)
plt.tight_layout()
plt.show()
Why did Bitcoin rise in November 2021?
Bitcoin (BTC) reached an all-time high in November 2021, with the price exceeding 65,000 USD. That particular price spike was connected to the launch of a Bitcoin ETF in the United States, while earlier spikes in 2021 were driven by events involving Tesla and Coinbase.
A Bitcoin ETF (Exchange-Traded Fund) is a type of investment fund that tracks the price of Bitcoin (BTC) and aims to replicate its performance. ETFs are similar to mutual funds but are traded on stock exchanges like individual stocks. This means investors can buy and sell shares of a Bitcoin ETF throughout the trading day, just like any other stock.
The primary objective of a Bitcoin ETF is to provide investors with exposure to the price movements of Bitcoin without having to own the actual cryptocurrency. Instead of directly buying and holding Bitcoin, investors can buy shares of the ETF, which represent ownership of a portfolio of Bitcoin or Bitcoin futures contracts held by the ETF issuer.
# Create a new DataFrame to show the maximum value and its index beside it
min_values_df = pd.DataFrame({
'Min Value': bitcoin_data.min(),
'Date of Min Value': bitcoin_data.idxmin()
})
min_values_df
| Min Value | Date of Min Value | |
|---|---|---|
| high | 15746.25 | 2022-11-22 08:00:00 |
| low | 15480.69 | 2022-11-21 21:00:00 |
| open | 15631.36 | 2022-11-21 22:00:00 |
| volumefrom | 77.80 | 2023-07-23 11:00:00 |
| volumeto | 2326693.14 | 2023-07-23 11:00:00 |
| close | 15631.36 | 2022-11-21 21:00:00 |
The slump in November 2022 was triggered by the collapse of FTX, which had handled around \$1 billion in transactions each day. Its collapse had a knock-on effect on other crypto exchanges. Earlier, in June 2022, Bitcoin had dropped below \$20,000 for the first time since 2020.
On November 11, 2022, FTX announced via Twitter that Bankman-Fried had resigned as CEO, that John J. Ray III would succeed him, and that the company had filed for bankruptcy.
# Create an interactive line plot with zooming
fig = go.Figure()
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', line=dict(color='dodgerblue')))
fig.update_layout(title='Hourly Bitcoin Closing Price',
xaxis_title='Date',
yaxis_title='Price')
fig.show()
Because there are so many data points, before plotting candlestick charts I will resample the data to weekly frequency and aggregate into OHLC (Open, High, Low, Close) bars.
bitcoin_data_weekly = bitcoin_data.resample('W').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volumefrom': 'sum', 'volumeto': 'sum'})
# Candlestick chart
mc = mpf.make_marketcolors(up='g', down='r', wick='inherit', volume='inherit')
s = mpf.make_mpf_style(marketcolors=mc)
fig_size = (12, 8) # Adjust width and height as needed
mpf.plot(bitcoin_data_weekly, type='candle', style=s, title='Weekly Bitcoin Prices (Candlestick Chart)', figsize=fig_size)
Highest and lowest trading volumes from Bitcoin¶
Volume From: "Volume From" represents the total trading volume of the base cryptocurrency (in this case, Bitcoin) in a specific trading pair. It indicates the total amount of the base cryptocurrency that has been traded during the given time period.
Volume To: "Volume To" represents the total trading volume of the quote currency (in this case, USD) in a specific trading pair. It indicates the total amount of the quote currency that has been traded during the given time period.
For example, let's say you have a trading pair BTC/USD, where Bitcoin (BTC) is the base currency, and the US Dollar (USD) is the quote currency. If the reported values for the trading pair are:
- Volume From (BTC): 100 BTC
- Volume To (USD): 2,000,000 USD
This means that during the specified time period, 100 Bitcoin (BTC) has been traded against the US Dollar (USD), and the total value of those trades is $2,000,000 USD.
The trading volume is an essential metric in cryptocurrency markets, as it provides insights into the liquidity and trading activity of a specific cryptocurrency pair. High trading volumes are often associated with more liquid markets, which can be beneficial for traders and investors.
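The relationship between the two volume columns can be sketched directly in pandas. The rows and values below are made up to mirror the dataset's volume columns; dividing them recovers the average traded price.

```python
import pandas as pd

# Toy rows mimicking the dataset's volume columns (values are made up)
df = pd.DataFrame(
    {"volumefrom": [100.0, 250.0], "volumeto": [2_000_000.0, 5_500_000.0]},
    index=pd.to_datetime(["2021-07-24 07:00:00", "2021-07-24 08:00:00"]),
)

# Dividing USD volume by BTC volume yields the volume-weighted average
# price (VWAP) implied by each hour's trades
df["implied_vwap"] = df["volumeto"] / df["volumefrom"]
print(df["implied_vwap"].tolist())  # [20000.0, 22000.0]
```

The same one-liner applied to the real `bitcoin_data` columns gives an hourly implied-VWAP series that can be compared against `close`.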
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_data.index, bitcoin_data['volumefrom'], label='Volume From', color='blue')
plt.title('Bitcoin Trading Volume Over Time')
plt.xlabel('Date')
plt.ylabel('Volume from Bitcoin')
plt.grid(True)
plt.show()
print("Highest trading volume from Bitcoin was {vol} and happened on {date}."
.format(vol=bitcoin_data.volumefrom.max(),date=bitcoin_data.volumefrom.idxmax()))
print("Lowest trading volume from Bitcoin was {vol} and happened on {date}."
.format(vol=bitcoin_data.volumefrom.min(),date=bitcoin_data.volumefrom.idxmin()))
Highest trading volume from Bitcoin was 80010.58 and happened on 2022-06-11 08:00:00.
Lowest trading volume from Bitcoin was 77.8 and happened on 2023-07-23 11:00:00.
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_data.index, bitcoin_data['volumeto'], label='Volume To', color='orange')
plt.title('Bitcoin Trading Volume Over Time')
plt.xlabel('Date')
plt.ylabel('Volume To USD')
plt.grid(True)
plt.show()
print("Highest trading volume to USD was {vol} and happened on {date}."
.format(vol=bitcoin_data.volumeto.max(),date=bitcoin_data.volumeto.idxmax()))
print("Lowest trading volume to USD was {vol} and happened on {date}."
.format(vol=bitcoin_data.volumeto.min(),date=bitcoin_data.volumeto.idxmin()))
Highest trading volume to USD was 2926103305.83 and happened on 2021-10-27 05:00:00.
Lowest trading volume to USD was 2326693.14 and happened on 2023-07-23 11:00:00.
The correlation between the trading volume and the Bitcoin closing price¶
import plotly.express as px
correlation = bitcoin_data['volumefrom'].corr(bitcoin_data['close'])
fig = px.scatter(bitcoin_data, x='volumefrom', y='close', opacity=0.5,
title='Correlation between Trading Volume and Bitcoin Closing Price',
labels={'volumefrom': 'Trading Volume', 'close': 'Closing Price'})
fig.update_layout(showlegend=False)
fig.show()
print("Correlation between Trading Volume and Closing Price:", correlation)
Correlation between Trading Volume and Closing Price: -0.1639708766072075
The weak negative correlation (about -0.16) suggests only a slight tendency for the closing price to be lower when trading volume is higher.
We see that the points are clustered to the left of the plot. This may indicate:
Higher Trading Activity at Lower Prices: The clustering of points to the left suggests that there is a concentration of higher trading activity (volume) when the Bitcoin price is relatively lower. This could mean that more traders are actively buying and selling Bitcoin when its price is in a specific range.
Key Price Levels: The clustering might indicate that there are certain key price levels or support/resistance levels where traders tend to engage in more buying and selling activities, leading to higher trading volumes. These levels could be significant for traders in their decision-making process.
Price Stability: The clustering might also reflect periods of price stability or consolidation, where the price is trading within a narrow range. During such periods, trading activity may be more pronounced as traders try to capitalize on potential price movements.
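Because the volume distribution is heavily right-skewed, a rank-based correlation is a useful cross-check alongside Pearson. The sketch below uses synthetic data with an assumed weak inverse relationship; the real `volumefrom` and `close` columns could be dropped in instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: right-skewed volume, price weakly inversely related
rng = np.random.default_rng(42)
volume = rng.lognormal(mean=6.0, sigma=1.0, size=1000)
price = 40_000 - 2.0 * volume + rng.normal(scale=3_000, size=1000)
df = pd.DataFrame({"volumefrom": volume, "close": price})

# Pearson is sensitive to the skewed volume tail; Spearman ranks the
# values first, so it is more robust for clustered scatter plots
pearson = df["volumefrom"].corr(df["close"])
spearman = df["volumefrom"].corr(df["close"], method="spearman")
print(round(pearson, 3), round(spearman, 3))
```

If the two coefficients disagree strongly on the real data, the clustering in the scatter plot is likely driven by a few extreme-volume hours rather than a broad relationship.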
Calculating and Visualizing Returns¶
# Calculating hourly percentage returns from the 'close' prices
bitcoin_data['Returns'] = bitcoin_data['close'].pct_change() * 100
Calculating hourly returns for financial data like Bitcoin prices might not be as meaningful as calculating daily or longer-term returns. Here's why:
Noise and Volatility: Hourly price movements in financial markets can be highly volatile and noisy due to factors such as high-frequency trading, news releases, and liquidity issues. This can lead to an excessive amount of noise in hourly returns, making it difficult to discern meaningful patterns.
Smoothing: Calculating returns at a higher frequency, like hourly, can lead to more pronounced short-term fluctuations that might obscure underlying trends. Longer timeframes like daily or weekly returns can help smooth out some of this noise, making it easier to identify trends and patterns.
Data Volume: Analyzing hourly returns might result in a large volume of data, which can make it challenging to draw meaningful insights and can lead to overfitting when developing forecasting models.
# Resample the data to daily intervals and calculate the daily closing prices
bitcoin_close_daily = bitcoin_data['close'].resample('D').last()
# Calculate the percentage daily returns
daily_returns = bitcoin_close_daily.pct_change() * 100
# Resample the data to weekly intervals and calculate the mean weekly closing price
bitcoin_close_weekly = bitcoin_data['close'].resample('W').mean()
# Calculate the percentage weekly returns
weekly_returns = bitcoin_close_weekly.pct_change() * 100
fig = go.Figure()
fig.add_trace(go.Scatter(x=bitcoin_close_daily.index, y=daily_returns, mode='lines', line=dict(color='dodgerblue'), name = 'Daily Returns'))
fig.add_trace(go.Scatter(x=bitcoin_close_weekly.index, y=weekly_returns, mode='lines', line=dict(color='blue'), name = 'Weekly Returns'))
fig.update_layout(title='Bitcoin Daily and Weekly Percentage Returns',
xaxis_title='Date',
yaxis_title='Daily/Weekly Percentage Returns')
fig.show()
Rolling Statistics¶
# Compute the weekly rolling mean and standard deviation
# Data is hourly, so a 7-day rolling window corresponds to 7 * 24 hourly observations
bitcoin_data['Weekly Rolling Mean'] = bitcoin_data['close'].rolling(window=7*24).mean()
bitcoin_data['Weekly Rolling Std'] = bitcoin_data['close'].rolling(window=7*24).std()
fig = go.Figure()
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', line=dict(color='dodgerblue'),
name='Bitcoin Price'))
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Mean'], mode='lines', line=dict(color='orange'),
name='Weekly Rolling Mean'))
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Std'], mode='lines', line=dict(color='lightgray'),
name='Weekly Rolling Std', fill='tonexty'))
fig.update_layout(title='Bitcoin Price with Weekly Rolling Mean and Std',
xaxis_title='Date',
yaxis_title='Price',
xaxis=dict(showgrid=True),
yaxis=dict(showgrid=True),
showlegend=True,
xaxis_rangeslider_visible=True)
fig.show()
Seasonal Decomposition¶
import statsmodels.api as sm
# Perform seasonal decomposition
decomposition = sm.tsa.seasonal_decompose(bitcoin_data['close'], model='additive', period=7*24)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
# Create line plots for trend, seasonality, and residuals
plt.figure(figsize=(10, 8))
plt.subplot(4, 1, 1)
plt.plot(bitcoin_data.index, bitcoin_data['close'], label='Original')
plt.title('Original Bitcoin Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.grid(True)
plt.subplot(4, 1, 2)
plt.plot(bitcoin_data.index, trend, label='Trend', color='orange')
plt.title('Trend Component')
plt.xlabel('Date')
plt.ylabel('Trend')
plt.grid(True)
plt.subplot(4, 1, 3)
plt.plot(bitcoin_data.index, seasonal, label='Seasonal', color='green')
plt.title('Seasonal Component')
plt.xlabel('Date')
plt.ylabel('Seasonal')
plt.grid(True)
plt.subplot(4, 1, 4)
plt.plot(bitcoin_data.index, residual, label='Residual', color='red')
plt.title('Residual Component')
plt.xlabel('Date')
plt.ylabel('Residual')
plt.grid(True)
plt.tight_layout()
plt.show()
The Seasonal component is overplotted; to see the seasonality we can either change the period of the decomposition or plot a zoomable figure.
fig = go.Figure()
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=seasonal, mode='lines', line=dict(color='green'),
name='Seasonal'))
fig.update_layout(title='Zoomable Seasonality Plot',
xaxis_title='Date',
yaxis_title='Value',
xaxis_rangeslider_visible=True)
fig.show()
At first glance, the Trend and Residual components look almost identical to the Weekly Rolling Mean and Std, respectively.
Let's plot each and compare.
fig = go.Figure()
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=trend, mode='lines', line=dict(color='dodgerblue'),
name='Trend'))
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Mean'], mode='lines', line=dict(color='orange'),
name='Weekly Rolling Mean'))
fig.update_layout(title='Trend and Weekly Rolling Mean',
xaxis_title='Date',
yaxis_title='Value',
xaxis=dict(showgrid=True),
yaxis=dict(showgrid=True),
showlegend=True,
xaxis_rangeslider_visible=True)
fig.show()
The trend component represents the underlying long-term movement of the data, while the rolling average smooths out short-term fluctuations to highlight the general trend.
The trend component and the weekly rolling mean are so similar because seasonal_decompose estimates the trend with a moving average whose window equals the decomposition period (here one week), which is essentially the same calculation as the weekly rolling mean. The main difference is that the decomposition's moving average is centred while the rolling mean is trailing, which produces a slight horizontal offset.
The decomposition uses the "additive" model, in which the observed data is treated as the sum of the trend, seasonal, and residual components.
Since the standard deviation is non-negative, we take the absolute value of the residual to make the comparison easier.
fig = go.Figure()
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=np.abs(residual), mode='lines', line=dict(color='dodgerblue'),
                         name='Absolute Residual'))
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Std'], mode='lines', line=dict(color='yellow'),
name='Weekly Rolling Std'))
fig.update_layout(title='Residual and Weekly Rolling Std',
xaxis_title='Date',
yaxis_title='Value',
xaxis=dict(showgrid=True),
yaxis=dict(showgrid=True),
showlegend=True,
xaxis_rangeslider_visible=True)
fig.show()
The residual component represents the part of the data that cannot be explained by the trend and seasonality; it is the leftover variation after both have been removed. The rolling standard deviation, by contrast, measures how much the data deviates from its average over a rolling one-week window.
If the decomposition captures the trend and seasonality well, the residual should contain mostly irregular fluctuations. It is therefore natural that the residual resembles the rolling standard deviation, which likewise captures the dispersion of the data around its mean.
Both the residual component and the rolling standard deviation provide insights into the volatility or variability of the data. A close similarity between the two may indicate that the data has relatively stable variability around the trend and seasonality.
Volatility Analysis¶
Volatility refers to how much variation there is in consecutive price changes over time.
Daily Volatility¶
Daily volatility measures the price changes of an asset on a daily basis. It can be calculated as the absolute daily returns or as the standard deviation of daily returns. Using returns to measure volatility is common practice in financial analysis and provides valuable insight into the variability of an asset's price movements.
Volatility Clustering refers to the tendency for periods of high volatility to be followed by more periods of high volatility and vice versa. This insight can help us understand how volatility changes over time and potentially predict periods of increased market activity.
By analyzing the relationship between volatility and market trends we see that high volatility coincides with major market events, such as price spikes or crashes. By examining volatility patterns alongside price movements, we can gain insights into how external factors impact the market.
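Volatility clustering can be made concrete by comparing the autocorrelation of returns with that of their absolute values. The sketch below simulates a GARCH-like toy process whose coefficients are assumptions for illustration; the notebook's daily_returns series could be used instead.

```python
import numpy as np
import pandas as pd

# GARCH(1,1)-style toy simulation of returns with clustered volatility
rng = np.random.default_rng(1)
n = 2000
sigma2 = np.empty(n)
r = np.empty(n)
sigma2[0] = 1.0
r[0] = 0.0
for t in range(1, n):
    sigma2[t] = 0.1 + 0.1 * r[t - 1] ** 2 + 0.85 * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.normal()
ret = pd.Series(r)

# Raw returns are nearly uncorrelated, but absolute returns remain
# autocorrelated: the signature of volatility clustering
ac_ret = ret.autocorr(lag=1)
ac_abs = ret.abs().autocorr(lag=1)
print(round(ac_ret, 3), round(ac_abs, 3))
```

Applied to real returns, a markedly higher autocorrelation of absolute (or squared) returns than of raw returns is evidence that calm and turbulent periods come in runs.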
import plotly.graph_objects as go
fig = go.Figure()
# Create the first trace (Daily Returns) using the primary y-axis
fig.add_trace(go.Scatter(x=bitcoin_close_daily.index, y=daily_returns, mode='lines', line=dict(color='dodgerblue'), name='Daily Returns'))
# Create the second trace (Daily Closing Prices) using the secondary y-axis
fig.add_trace(go.Scatter(x=bitcoin_close_daily.index, y=bitcoin_close_daily, mode='lines', line=dict(color='orange'), name='Daily Closing Prices', yaxis='y2'))
# Set up the layout with two y-axes
fig.update_layout(title='Bitcoin Daily Returns and Closing Prices',
xaxis_title='Date',
yaxis_title='Daily Returns',
yaxis2=dict(title='Daily Closing Prices', overlaying='y', side='right'),
legend=dict(x=0, y=1, bgcolor='rgba(255, 255, 255, 0.5)'),
)
fig.show()
As can be seen in this plot, spikes and crashes of the market coincide with periods of high volatility.
Historical Volatility¶
Historical volatility measures the past price fluctuations of an asset over a specific period. It is typically computed as the standard deviation of the asset's returns. Higher historical volatility implies greater price variability in the past.
The debate over annualizing volatility—traditionally done with 252 trading days—takes on a new dimension with assets like Bitcoin, which trades around the clock. While 252 days align with standard market hours, the continuous trading of cryptocurrencies prompts consideration for using 365 days instead. This acknowledges the ceaseless nature of Bitcoin's trading activity, capturing its year-round price movements. The choice between the two approaches depends on analysis context and the need to adapt methodologies to the unique dynamics of cryptocurrency markets.
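A small numeric illustration of how the convention changes the headline figure, assuming a daily return standard deviation of 3% (an arbitrary example value). For hourly returns, as used in this notebook, the corresponding factor would be sqrt(365 * 24).

```python
import numpy as np

# Assumed example value: standard deviation of daily returns is 3%
daily_std = 0.03

# Annualized volatility scales with the square root of periods per year,
# so the chosen convention changes the headline number
vol_252 = daily_std * np.sqrt(252)  # equity-market convention
vol_365 = daily_std * np.sqrt(365)  # 24/7 crypto convention
print(round(vol_252, 4), round(vol_365, 4))
```

The 365-day figure is roughly 20% larger than the 252-day figure for the same underlying data, so the convention should always be stated alongside the number.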
# Annualizing using 365 days (24/7 trading); with hourly returns a year
# contains 365 * 24 observations, so the factor is sqrt(365 * 24)
historical_volatility = bitcoin_data['Returns'].rolling(window=7*24).std() * ((365 * 24) ** 0.5)
fig, ax1 = plt.subplots(figsize=(10, 6))
fig, ax1 = plt.subplots(figsize=(10, 6))
# Plot Bitcoin prices on the primary y-axis (left side)
ax1.plot(bitcoin_data.index, bitcoin_data['close'], label='Bitcoin Price', color='blue')
ax1.set_xlabel('Date')
ax1.set_ylabel('Bitcoin Price', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')
ax1.grid(True)
# Create a secondary y-axis for historical volatility (right side)
ax2 = ax1.twinx()
ax2.plot(bitcoin_data.index, historical_volatility, label='Historical Volatility', color='orange')
ax2.set_ylabel('Historical Volatility', color='orange')
ax2.tick_params(axis='y', labelcolor='orange')
# Add a legend that combines both lines from both y-axes
lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc='upper left')
plt.title('Bitcoin Price and Historical Volatility')
plt.grid(True)
plt.tight_layout()
plt.show()
Volatility Indicators¶
There are various volatility indicators that can help identify trends or changes in volatility. Some popular volatility indicators include the Bollinger Bands, Average True Range (ATR), and the Volatility Index (VIX).
Bollinger Bands¶
Bollinger Bands are a popular technical indicator that helps traders and analysts understand price volatility and potential trading signals. They were developed by John Bollinger in the 1980s and consist of three lines plotted on a price chart:
Middle Band: The middle band is a simple moving average (SMA) of the asset's price over a specified period. The most commonly used period is 20 days, but you can adjust it based on your analysis objectives.
Upper Band: The upper band is derived by adding a specified number of standard deviations (usually 2) to the middle band. The standard deviation is a measure of the asset's price volatility. The upper band represents a zone where prices are relatively high.
Lower Band: The lower band is derived by subtracting a specified number of standard deviations (usually 2) from the middle band. The lower band represents a zone where prices are relatively low.
Bollinger Bands can be used for:
Volatility Assessment: Bollinger Bands provide a visual representation of market volatility. When the bands are wide, it indicates higher volatility, and when they are narrow, it indicates lower volatility.
Overbought and Oversold Levels: Traders often look for potential buying opportunities when the price touches or crosses the lower band, as it suggests that the asset may be oversold. Similarly, potential selling opportunities are sought when the price touches or crosses the upper band, as it suggests that the asset may be overbought.
Price Breakouts: Bollinger Bands can help identify potential breakouts. A breakout occurs when the price moves outside the bands. Traders may interpret a breakout as a signal to enter or exit positions.
# For example, using Bollinger Bands to identify volatility bands around the moving average
upper_band = bitcoin_data['Weekly Rolling Mean'] + 2 * bitcoin_data['Weekly Rolling Std']
lower_band = bitcoin_data['Weekly Rolling Mean'] - 2 * bitcoin_data['Weekly Rolling Std']
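A related measure not computed in the notebook is the Bollinger bandwidth, which turns the band separation into a single volatility gauge. A minimal sketch on a synthetic price series follows; the window and scale are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic hourly close prices (random walk; scale is an assumption)
rng = np.random.default_rng(7)
close = pd.Series(30_000 + np.cumsum(rng.normal(scale=50, size=500)))

window = 20
mid = close.rolling(window).mean()
std = close.rolling(window).std()
upper = mid + 2 * std
lower = mid - 2 * std

# Bandwidth normalizes band separation by the middle band: high values
# mean high volatility, low values mean a "squeeze"
bandwidth = (upper - lower) / mid
print(round(bandwidth.dropna().mean(), 5))
```

On the real data, the same formula applied to `upper_band`, `lower_band`, and the `Weekly Rolling Mean` would highlight squeeze periods that often precede breakouts.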
Identifying overbought and potential price correction datetimes¶
# Identify overbought datetimes using the upper Bollinger Band
overbought_datetimes = bitcoin_data[bitcoin_data.close >= upper_band].index
# Set the number of periods to consider for potential corrections
num_periods_after_overbought = 5
# Initialize an empty list to store potential correction datetimes
correction_datetimes = []
# Look for price reversals after overbought datetimes
for overbought_datetime in overbought_datetimes:
# Get the index of the overbought datetime in the DataFrame
overbought_index = bitcoin_data.index.get_loc(overbought_datetime)
# Check if the price falls below the 'Weekly Rolling Mean' within the specified number of periods
for i in range(1, num_periods_after_overbought + 1):
next_index = overbought_index + i
if next_index < len(bitcoin_data):
if bitcoin_data['close'].iloc[next_index] < bitcoin_data['Weekly Rolling Mean'].iloc[next_index]:
correction_datetimes.append(bitcoin_data.index[next_index])
break
# Plot Bitcoin prices
trace_price = go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', name='Bitcoin Price', line=dict(color='blue', width=2))
# Plot the Weekly Simple Moving Average (SMA) as the middle band
trace_sma = go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Mean'], mode='lines', name='Weekly SMA', line=dict(color='orange', width=2))
trace_upper_band = go.Scatter(x=bitcoin_data.index, y=upper_band, name = 'Upper Band',
fill='tonexty', fillcolor='rgba(128, 128, 128, 0.2)', line=dict(color='aliceblue'))
trace_lower_band = go.Scatter(x=bitcoin_data.index, y=lower_band, name = 'Lower Band',
fill='tonexty', fillcolor='rgba(128, 128, 128, 0.2)', line=dict(color='honeydew'))
# Create a trace for potential correction datetimes
trace_correction_datetimes = go.Scatter(x=correction_datetimes, y=bitcoin_data.loc[correction_datetimes, 'close'], mode='markers', name='Potential Correction Datetimes', marker=dict(color='red', size=10, symbol='circle'))
# Combine all traces into a data list
data = [trace_price, trace_sma, trace_upper_band, trace_lower_band, trace_correction_datetimes]
# Create layout
layout = go.Layout(title='Bitcoin Price with Bollinger Bands and Potential Correction Datetimes',
xaxis=dict(title='Date'),
yaxis=dict(title='Bitcoin Price', side='left', titlefont=dict(color='blue')),
yaxis2=dict(title='Weekly Rolling Mean', overlaying='y', side='right', titlefont=dict(color='orange')),
showlegend=True,
legend=dict(x=0, y=1, traceorder='normal'))
# Create the figure
fig = go.Figure(data=data, layout=layout)
# Show the plot
fig.show()
Uptrend and downtrend¶
- Trading Signals: Uptrends and downtrends can serve as trading signals for traders and investors. For example, when the price is in an uptrend, it might be a signal to buy or hold the asset, while a downtrend could indicate a potential selling opportunity.
- Risk Management: Understanding the trend direction can help with risk management. Traders might reduce their exposure to the asset during downtrends to minimize potential losses, while increasing exposure during uptrends to take advantage of potential gains.
- Strategy Development: Uptrends and downtrends can be used as components in developing trading strategies. For instance, you could create a trend-following strategy that buys when an uptrend is confirmed and sells or shorts when a downtrend is confirmed.
- Volatility Analysis: Analyzing trends can also provide insights into the market's volatility. Volatile markets may exhibit rapid and frequent changes in trend direction, while less volatile markets might have more stable and sustained trends.
- Market Sentiment: Trends can reflect market sentiment and help gauge the overall bullish or bearish mood among traders and investors.
- Pattern Recognition: By identifying trends, you can also look for patterns such as higher highs and higher lows in uptrends, and lower highs and lower lows in downtrends. Recognizing these patterns can provide additional information for making trading decisions.
# Resample the data to weekly intervals and calculate the weekly mean
bitcoin_data_weekly = bitcoin_data.resample('W').mean()
# Initialize a boolean mask for uptrends and downtrends
is_uptrend = bitcoin_data_weekly['close'] > bitcoin_data_weekly['Weekly Rolling Mean']
is_downtrend = bitcoin_data_weekly['close'] < bitcoin_data_weekly['Weekly Rolling Mean']
# Use the boolean masks to label the trends (1 for uptrend, -1 for downtrend, and 0 for neutral)
bitcoin_data_weekly['Trend'] = 0 # Initialize the 'Trend' column with 0 (neutral)
bitcoin_data_weekly.loc[is_uptrend, 'Trend'] = 1
bitcoin_data_weekly.loc[is_downtrend, 'Trend'] = -1
bitcoin_data['3D Rolling Mean'] = bitcoin_data['close'].rolling(window=3*24).mean()
bitcoin_data['3D Rolling Std'] = bitcoin_data['close'].rolling(window=3*24).std()
# Resample the data to 3-day intervals and calculate the 3-day mean
bitcoin_data_3day = bitcoin_data.resample('3D').mean()
# Initialize a boolean mask for uptrends and downtrends
is_uptrend = bitcoin_data_3day['close'] > bitcoin_data_3day['3D Rolling Mean']
is_downtrend = bitcoin_data_3day['close'] < bitcoin_data_3day['3D Rolling Mean']
# Create the Bitcoin price line trace containing all data
price_trace = go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', name='Bitcoin Price')
# Create the uptrend and downtrend traces using scatter plots based on 3-day resampled data
uptrend_trace = go.Scatter(x=bitcoin_data_3day.index[is_uptrend], y=bitcoin_data_3day['close'][is_uptrend],
mode='markers', name='Uptrend', marker=dict(color='green', symbol='triangle-up'))
downtrend_trace = go.Scatter(x=bitcoin_data_3day.index[is_downtrend], y=bitcoin_data_3day['close'][is_downtrend],
mode='markers', name='Downtrend', marker=dict(color='red', symbol='triangle-down'))
# Combine the traces
data = [price_trace, uptrend_trace, downtrend_trace]
# Create the layout
layout = go.Layout(title='Bitcoin Price with Uptrends and Downtrends (3-Day Resampling)',
xaxis=dict(title='Date'),
yaxis=dict(title='Bitcoin Price (USD)'),
showlegend=True,
)
# Create the figure and plot
fig = go.Figure(data=data, layout=layout)
fig.show()
Average True Range (ATR):¶
ATR is a technical indicator used to measure market volatility. It was introduced by J. Welles Wilder Jr. in his book "New Concepts in Technical Trading Systems." ATR calculates the average range between the high and low prices over a specific period, considering potential gaps between consecutive trading days.
ATR can provide valuable insights into the volatility of an asset, helping traders and investors make informed decisions. Higher ATR values indicate higher volatility, while lower values indicate lower volatility.
# Calculate Average True Range (ATR) for Bitcoin data
high_low_range = bitcoin_data['high'] - bitcoin_data['low']
high_close_range = abs(bitcoin_data['high'] - bitcoin_data['close'].shift())
low_close_range = abs(bitcoin_data['low'] - bitcoin_data['close'].shift())
true_range = pd.DataFrame({'HL Range': high_low_range, 'HC Range': high_close_range, 'LC Range': low_close_range})
ATR = true_range.max(axis=1).rolling(window=14).mean()
# Plot Bitcoin prices with Average True Range (ATR) on a secondary y-axis
fig, ax1 = plt.subplots(figsize=(10, 6))
# Plot Bitcoin prices on the primary y-axis (left side)
ax1.plot(bitcoin_data.index, bitcoin_data['close'], label='Bitcoin Price', color='blue')
ax1.set_xlabel('Date')
ax1.set_ylabel('Bitcoin Price', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')
ax1.grid(True)
# Create a secondary y-axis for ATR (right side)
ax2 = ax1.twinx()
ax2.plot(bitcoin_data.index, ATR, label='Average True Range (ATR)', color='purple')
ax2.set_ylabel('Average True Range (ATR)', color='purple')
ax2.tick_params(axis='y', labelcolor='purple')
# Add a legend that combines both lines from both y-axes
lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc='upper left')
plt.title('Bitcoin Price and Average True Range (ATR)')
plt.grid(True)
plt.tight_layout()
plt.show()
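The rolling mean above is a common simplification of ATR. Wilder's original definition uses his own recursive smoothing, which is equivalent to an exponential moving average with alpha = 1/n. A minimal sketch on synthetic OHLC data (the function name `wilder_atr` and the example data are illustrative, not from the notebook):

```python
import numpy as np
import pandas as pd

def wilder_atr(df: pd.DataFrame, n: int = 14) -> pd.Series:
    """ATR with Wilder's smoothing, i.e. an EMA with alpha = 1/n."""
    prev_close = df['close'].shift()
    true_range = pd.concat([
        df['high'] - df['low'],
        (df['high'] - prev_close).abs(),
        (df['low'] - prev_close).abs(),
    ], axis=1).max(axis=1)
    return true_range.ewm(alpha=1 / n, adjust=False, min_periods=n).mean()

# Synthetic example data standing in for bitcoin_data
rng = np.random.default_rng(0)
close = 30000 + rng.normal(0, 100, 100).cumsum()
df = pd.DataFrame({'close': close,
                   'high': close + rng.uniform(0, 50, 100),
                   'low': close - rng.uniform(0, 50, 100)})
atr = wilder_atr(df)
```

With `min_periods=n`, the first n-1 values are NaN, mirroring the warm-up period of the rolling-mean version above.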
Time-Series Modeling and Forecasting¶
We'll consider the ARIMA (AutoRegressive Integrated Moving Average) model, SARIMA (Seasonal ARIMA) model, and Facebook Prophet for forecasting. The goal is to train the models on a portion of the data and validate the performance on unseen data.
ARIMA (AutoRegressive Integrated Moving Average) Model¶
ACF: The AutoCorrelation Function measures the correlation between a time series and its lagged values. It helps identify the degree of autocorrelation at different lags. The ACF plot shows the correlation coefficient at various lags; a slow, gradual decay typically indicates a non-stationary time series.
PACF: The Partial AutoCorrelation Function measures the correlation between a time series and its lagged values, removing the effect of intermediate lags. It helps to identify the direct relationship between a time series and its lagged values. The PACF plot helps determine the order of the AR (AutoRegressive) component in an ARIMA model.
Using ACF and PACF plots, we can identify the order of the ARIMA model (p, d, q) as follows:
AR(p): The order of the AutoRegressive component (p) can be determined by looking at the PACF plot. The PACF plot will show significant spikes at lag points that indicate the direct relationship between the time series and its lagged values. The order of the AR component is usually the highest lag value with a significant spike before it starts to drop off.
I(d): The order of Integration (d) represents the number of differencing operations required to make the time series stationary. This can be determined by looking at the ACF plot. If the ACF plot shows a gradual decay or the series is already stationary, then d = 0. Otherwise, d is the minimum differencing required to make the series stationary.
MA(q): The order of the Moving Average component (q) can be determined by looking at the ACF plot. The ACF plot will show significant spikes at lag points that indicate the correlation between the time series and its lagged moving average values. The order of the MA component is usually the highest lag value with a significant spike before it starts to drop off.
By analyzing the ACF and PACF plots, we can determine the appropriate values for p, d, and q, which form the order of the ARIMA model to best capture the underlying patterns and autocorrelation in the time series data.
ACF and PACF Plots¶
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Plot the ACF and PACF plots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
# Plot the ACF plot
plot_acf(bitcoin_close_daily, lags=50, ax=ax1)
# Plot the PACF plot using the 'ywm' method
plot_pacf(bitcoin_close_daily, lags=50, ax=ax2, method='ywm')
plt.show()
# Plot the ACF and PACF plots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
# Plot the ACF plot
plot_acf(bitcoin_data_weekly['close'], lags=50, ax=ax1)
# Plot the PACF plot using the 'ywm' method
plot_pacf(bitcoin_data_weekly['close'], lags=50, ax=ax2, method='ywm')
plt.show()
As can be seen in the plots, the ACF value never reaches 0 when we resample the hourly data to daily, but reaches 0 and becomes negative when we resample to weekly. This behavior is expected and can be attributed to the impact of seasonality on the time series.
When we resample the hourly data to daily intervals, we are likely retaining the impact of intra-day patterns and fluctuations in the time series. As a result, the ACF values may not reach 0 since there could be some correlation between the data points within each day. In other words, the daily data still carries the memory of the previous hour's data, leading to non-zero ACF values.
On the other hand, when we resample the data to weekly intervals, we are aggregating the daily data over each week. By doing so, we are effectively removing the finer intra-day patterns and fluctuations, and the resulting weekly data might exhibit more apparent seasonality or periodicity. The ACF values may then reach 0 and even become negative due to the seasonality patterns that repeat at weekly intervals.
This behavior highlights the importance of understanding the inherent patterns and seasonality in the time series data before conducting ACF analysis. The choice of the resampling frequency can have a significant impact on the ACF results and the insights derived from them.
We are interested in capturing and analyzing seasonality, so the ACF and PACF plots of the resampled weekly data could provide valuable insights into the underlying periodic patterns and autocorrelation structure.
# Calculate first-order differencing (removing the trend)
bitcoin_data['Differenced'] = bitcoin_data['close'].diff()
# Plot the ACF and PACF plots (drop the NaN produced by differencing)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
# Plot the ACF plot
plot_acf(bitcoin_data['Differenced'].dropna(), lags=50, ax=ax1)
# Plot the PACF plot using the 'ywm' method
plot_pacf(bitcoin_data['Differenced'].dropna(), lags=50, ax=ax2, method='ywm')
plt.show()
- Autoregressive (AR) Component (p):
- Look at the PACF plot and find the last significant spike before it drops to zero. The lag at which this spike occurs can give you an idea of the order of the AR component (p).
- If the PACF plot shows a significant spike at lag 1 and a gradual decay afterward, you might consider an AR(p) model with p=1.
- Moving Average (MA) Component (q):
- Examine the ACF plot and find the last significant spike before it drops to zero. The lag at which this spike occurs can provide an indication of the order of the MA component (q).
- If the ACF plot shows a significant spike at lag 1 and a gradual decay afterward, you might consider an MA(q) model with q=1.
- Differencing (d):
- Look for the number of times you need to difference the data to make it stationary. This is the value of d.
- If the differenced data shows a fairly stable mean and variance over time, d=1 may be sufficient. If it is still non-stationary, try d=2 or higher.
from sklearn.model_selection import train_test_split
from statsmodels.tsa.arima.model import ARIMA
bitcoin_daily = bitcoin_data['close'].asfreq('D')
# Split the data into training and validation sets using train_test_split
train_size = 0.8
train, test = train_test_split(bitcoin_daily, train_size=train_size, shuffle=False, random_state=42)
# Fit the ARIMA model
p, d, q = 28, 1, 1
# p, d, q = 29, 0, 1
arima_model = ARIMA(train, order=(p, d, q))
results_arima = arima_model.fit()
# Make predictions on the validation set
start_index = len(train)
end_index = len(bitcoin_daily) - 1
predictions_arima = results_arima.predict(start=start_index, end=end_index, dynamic=False)
# Plot the actual vs. predicted Bitcoin prices
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_arima.index, predictions_arima, label='Predicted Prices', color='red')
plt.title('ARIMA Model: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()
from sklearn.metrics import mean_absolute_error, mean_squared_error
mae_arima = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima)
rmse_arima = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima))
print("ARIMA MAE:", mae_arima)
print("ARIMA RMSE:", rmse_arima)
ARIMA MAE: 4200.793708562146
ARIMA RMSE: 4572.152955434231
import warnings
# Suppress all warnings
warnings.filterwarnings('ignore')
bitcoin_daily = bitcoin_data['close'].asfreq('D')
# Split the data into training and validation sets using train_test_split
train_size = 0.8
train, test = train_test_split(bitcoin_daily, train_size=train_size, shuffle=False)
# Define the parameter grid for p, d, and q
param_grid = {
'p': [27, 28, 29],
'd': [0, 1, 2],
'q': [0, 1, 2],
}
best_mae = float('inf')
best_params = None
# Iterate through the parameter grid
for p in param_grid['p']:
    for d in param_grid['d']:
        for q in param_grid['q']:
            try:
                # Fit the ARIMA model
                arima_model = ARIMA(train, order=(p, d, q))
                results = arima_model.fit()
                # Make predictions on the validation set
                start_index = len(train)
                end_index = len(bitcoin_daily) - 1
                predictions = results.predict(start=start_index, end=end_index, dynamic=False)
                # Calculate MAE
                mae = mean_absolute_error(test, predictions)
                # Keep the parameters that give the lowest MAE
                if mae < best_mae:
                    best_mae = mae
                    best_params = (p, d, q)
            except Exception:
                continue
print("Best MAE:", best_mae)
print("Best Parameters (p, d, q):", best_params)
Best MAE: 1701.255371960117
Best Parameters (p, d, q): (29, 0, 1)
# Fit the ARIMA model with the best parameters
best_arima_model = ARIMA(train, order=best_params)
best_results = best_arima_model.fit()
# Make predictions on the validation set
start_index = len(train)
end_index = len(bitcoin_daily) - 1
predictions_arima_tuned = best_results.predict(start=start_index, end=end_index, dynamic=False)
mae_arima_tuned = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima_tuned)
rmse_arima_tuned = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima_tuned))
print("ARIMA MAE Tuned:", mae_arima_tuned)
print("ARIMA RMSE Tuned:", rmse_arima_tuned)
ARIMA MAE Tuned: 1701.255371960117
ARIMA RMSE Tuned: 2054.649869380668
# Plot the actual vs. predicted Bitcoin prices
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_arima_tuned.index, predictions_arima_tuned, label='Predicted Prices', color='red')
plt.title('ARIMA Model With Hyperparameter Tuning: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()
SARIMA (Seasonal ARIMA) Model¶
SARIMA (Seasonal Autoregressive Integrated Moving Average) is an extension of the ARIMA (Autoregressive Integrated Moving Average) model that incorporates seasonality. It is a powerful time-series forecasting method that can handle data with both trend and seasonality.
Seasonal Component: SARIMA introduces a seasonal component that captures repeating patterns in the data at fixed intervals. This is suitable for data with seasonality, such as monthly, quarterly, or yearly patterns.
Autoregressive (AR) Component: The autoregressive component captures the relationship between the current value of the series and its past values. It involves regressing the series against its own lagged values.
Integrated (I) Component: The integrated component refers to differencing the series to make it stationary. Stationarity is important for time-series models as it helps stabilize the mean and variance over time.
Moving Average (MA) Component: The moving average component models the relationship between the current value of the series and its past forecast errors (lags of the error term).
Parameters of SARIMA:
The SARIMA model is defined by three sets of parameters:
p, d, q: These parameters correspond to the autoregressive order (p), differencing order (d), and moving average order (q) of the non-seasonal part of the model.
P, D, Q, s: These parameters correspond to the autoregressive order (P), differencing order (D), moving average order (Q), and the length of the seasonal period (s) for the seasonal part of the model.
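In practice, a seasonal differencing order of D=1 with s=7 means subtracting the value one season back, y_t - y_{t-7}. A quick pandas illustration on a synthetic daily series with a 7-day cycle (all data made up for the example):

```python
import numpy as np
import pandas as pd

# Daily series with a strong 7-day cycle plus small noise
idx = pd.date_range('2023-01-01', periods=140, freq='D')
rng = np.random.default_rng(1)
y = pd.Series(np.tile([0, 1, 2, 3, 4, 5, 6], 20) * 100.0
              + rng.normal(0, 5, 140), index=idx)

# One seasonal difference with s=7: y_t - y_{t-7} removes the weekly cycle
seasonal_diff = y.diff(7)
print(y.std(), seasonal_diff.std())
```

The seasonally differenced series has a far smaller standard deviation because only the noise remains, which is exactly what the D=1, s=7 component of the SARIMA fit below exploits.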
bitcoin_daily = bitcoin_data['close'].asfreq('D')
# Split the data into training and validation sets using train_test_split
train_size = 0.8
train, test = train_test_split(bitcoin_daily, train_size=train_size, shuffle=False, random_state=42)
# Define the order of the SARIMA model (p, d, q, P, D, Q, s)
order = best_params # Non-seasonal components (p, d, q)
seasonal_order = (0, 1, 1, 7) # Seasonal components (P, D, Q, s)
# Fit the SARIMA model on the training data only
sarima_model = sm.tsa.SARIMAX(train, order=order, seasonal_order=seasonal_order)
sarima_fit = sarima_model.fit()
# Make predictions on the validation set
start_index = len(train)
end_index = len(bitcoin_daily) - 1
predictions_sarima = sarima_fit.predict(start=start_index, end=end_index, dynamic=False)
# Plot the actual vs. predicted Bitcoin prices
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_sarima.index, predictions_sarima, label='Predicted Prices', color='red')
plt.title('SARIMA Model: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()
Checking for overfitting by making predictions on training data and calculating the MAE.
tp = sarima_fit.predict(start=1, end=len(train) - 1, dynamic=False)
train_mae = mean_absolute_error(train[1:], tp)
print("MAE on training data:", train_mae)
MAE on training data: 664.3603803557104
print("MAE on test data:", mean_absolute_error(bitcoin_daily[start_index:], predictions_sarima))
MAE on test data: 498.58609823816266
The training MAE is comparable to the test MAE, so the model does not appear to be overfitting.
# Number of future periods to forecast (here: the length of the test set)
forecast_periods = len(test)
# Create a DataFrame with future dates for forecasting
last_date = bitcoin_daily.index[-1]
future_dates = pd.date_range(start=last_date, periods=forecast_periods + 1, freq='D')
future_dates = future_dates[1:]  # Drop the first generated date, which duplicates the last observed date
# Generate forecasts for future dates
forecast_steps = len(future_dates) # Number of steps to forecast
forecast = sarima_fit.forecast(steps=forecast_steps)
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_sarima.index, predictions_sarima, label='Predictions', color='red')
plt.plot(forecast.index, forecast, label='Forecasts', color='green')
plt.title('SARIMA Model: Actual, Predictions, and Forecasts')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()
Facebook's Prophet¶
Prophet is a time series forecasting model developed by Facebook's Core Data Science team. It is designed to handle time series data with strong seasonal patterns, multiple seasonality, and holiday effects. The model decomposes the time series into several components, including trend, seasonality, and holiday effects. It then models each component independently and combines them to make accurate forecasts.
from prophet import Prophet
bitcoin_prophet = bitcoin_data[['close']].reset_index()
bitcoin_prophet.rename(columns={'time': 'ds', 'close': 'y'}, inplace=True)
train_size = 0.8
train_prophet, test_prophet = train_test_split(bitcoin_prophet, train_size=train_size, shuffle=False, random_state=42)
model = Prophet()
model.fit(train_prophet)
# Set the number of periods to forecast: roughly six months of hourly steps
forecast_periods = 6 * 30 * 24
# Create a DataFrame with future dates for forecasting
future_dates = model.make_future_dataframe(periods=forecast_periods, freq='H')
# Make predictions for the future dates
forecast = model.predict(future_dates)
fig2 = model.plot_components(forecast)
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_prophet['ds'], bitcoin_prophet['y'], label='Actual Prices', color='blue')
plt.plot(forecast['ds'], forecast['yhat'], label='Predicted Prices', color='red')
plt.title('Prophet Model: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()
# Extract the predictions for the train set
train_predictions = forecast[forecast['ds'].isin(train_prophet['ds'])]
# Calculate the mean absolute error (MAE)
mae_train = mean_absolute_error(train_prophet['y'], train_predictions['yhat'])
# Calculate the root mean squared error (RMSE)
rmse_train = np.sqrt(mean_squared_error(train_prophet['y'], train_predictions['yhat']))
print("Mean Absolute Error (MAE):", mae_train)
print("Root Mean Squared Error (RMSE):", rmse_train)
Mean Absolute Error (MAE): 1553.683045298484
Root Mean Squared Error (RMSE): 1960.5377423900902
Model Performance Comparisons¶
from sklearn.metrics import mean_absolute_error, mean_squared_error
start_index = len(train)
end_index = len(bitcoin_daily) - 1
mae_arima = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima)
rmse_arima = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima))
print("ARIMA MAE:", mae_arima)
print("ARIMA RMSE:", rmse_arima)
mae_arima_tuned = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima_tuned)
rmse_arima_tuned = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima_tuned))
print("ARIMA MAE Tuned:", mae_arima_tuned)
print("ARIMA RMSE Tuned:", rmse_arima_tuned)
mae_sarima = mean_absolute_error(bitcoin_daily[start_index:], predictions_sarima)
rmse_sarima = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_sarima))
print("SARIMA MAE:", mae_sarima)
print("SARIMA RMSE:", rmse_sarima)
# Note: Prophet is evaluated on the hourly series, while the ARIMA/SARIMA
# metrics above are computed on daily data, so the errors are not directly comparable
predictions_prophet = forecast[forecast['ds'].isin(test_prophet['ds'])]
mae_prophet = mean_absolute_error(test_prophet['y'], predictions_prophet['yhat'])
rmse_prophet = np.sqrt(mean_squared_error(test_prophet['y'], predictions_prophet['yhat']))
print("Prophet MAE:", mae_prophet)
print("Prophet RMSE:", rmse_prophet)
ARIMA MAE: 4200.793708562146
ARIMA RMSE: 4572.152955434231
ARIMA MAE Tuned: 1701.255371960117
ARIMA RMSE Tuned: 2054.649869380668
SARIMA MAE: 498.58609823816266
SARIMA RMSE: 686.8447388467636
Prophet MAE: 1855.2041767475887
Prophet RMSE: 2310.2446769137327